
What’s Changing?

Playlab is strengthening its safety and moderation systems to give educators and administrators better visibility into student interactions. These updates focus on three areas: improved moderation accuracy, org-level notification controls, and clearer tools for reviewing flagged content. The changes are designed to help organizations feel confident that Playlab is a safe environment for students, without turning Playlab into a full incident management system. The goal is notification and attention management: admins should know what needs their attention and be able to act on it quickly.

Moderation Improvements

Playlab’s moderation system automatically detects potentially inappropriate AI messages and hides them from users. The team is actively refining how moderation works to reduce disruptive false positives (messages that shouldn’t have been flagged) and improve clarity. Ongoing improvements include:
  • Refining moderation categories to be more intuitive and clearly labeled
  • Checking app prompts for usage policy violations before they turn into problematic conversations
  • Providing additional context from surrounding messages to improve moderator understanding
  • Letting you give feedback on moderated messages in the activity view
  • Building a data annotation pipeline to train more accurate and nuanced moderation models
As these improvements roll out, you may notice fewer false flags and more accurate categorization of moderated content in our new moderation digest emails.
Moderation is still in active development. If you encounter a moderation decision that seems incorrect, please let us know at support@playlab.ai.

Org-Level Moderation Notifications

Organization admins now receive email notifications when moderated content is detected in their apps. These batch emails provide a digest of flagged activity so you can stay informed without being overwhelmed.

How Notifications Work

  • Organization admins receive a moderation digest summarizing flagged activity across their org’s apps
  • Notifications are batched and sent periodically rather than in real time, so your inbox stays manageable
  • Messages flagged for reasons related to self-harm are always sent to org admins as urgent notifications in real time
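
The routing rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Playlab's implementation: the `Flag` and `Router` names and the `"self-harm"` category label are hypothetical. It also assumes that urgent flags are still included in the next digest, which the text does not state explicitly.

```python
from dataclasses import dataclass, field

# Hypothetical label; Playlab's actual category names are not listed on this page.
URGENT_CATEGORIES = {"self-harm"}


@dataclass
class Flag:
    app_name: str
    category: str
    rationale: str


@dataclass
class Router:
    pending_digest: list = field(default_factory=list)  # waits for the next batch
    urgent_sent: list = field(default_factory=list)     # delivered immediately

    def handle(self, flag: Flag) -> str:
        """Route one flagged message and report which path it took."""
        # Assumption: every flag is queued so the digest stays a complete summary.
        self.pending_digest.append(flag)
        if flag.category in URGENT_CATEGORIES:
            self.urgent_sent.append(flag)  # real-time urgent notification
            return "urgent"
        return "digest"

    def flush_digest(self) -> list:
        """Return the batched flags for the periodic digest and clear the queue."""
        batch, self.pending_digest = self.pending_digest, []
        return batch
```

In this sketch, only self-harm flags trigger an immediate notification; everything else waits for the next periodic digest.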

What’s Included in a Notification

Each moderation email includes the rationale for why a message was flagged, the name of the app where the conversation occurred, and a deep link to review the flagged message and its surrounding context in Playlab. Digest emails contain a summary of all flagged or moderated messages from the last 72 hours.
Example moderation email notification
At least one admin in your organization must be designated to receive safety notifications. This ensures that flagged content is always reviewed and no moderation alerts go unnoticed.
If you received a moderation email and are unsure about a category label, this is expected as the team works to refine category names. Contact your Playlab org admins or reach out to support@playlab.ai with any questions.
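
As a rough sketch of the digest contents described above, the following Python function selects flags from the last 72 hours and lists the rationale, app name, and review link for each. The field names (`flagged_at`, `app_name`, `rationale`, `review_url`) and the output layout are assumptions made for illustration; the actual email format may differ.

```python
from datetime import datetime, timedelta, timezone

DIGEST_WINDOW = timedelta(hours=72)  # window stated in the text above


def build_digest(flags, now):
    """Return plain-text lines summarizing flags raised within the digest window."""
    cutoff = now - DIGEST_WINDOW
    recent = [f for f in flags if f["flagged_at"] >= cutoff]
    lines = [f"{len(recent)} flagged message(s) in the last 72 hours"]
    # Oldest first, one line per flag: app name, rationale, and review link.
    for f in sorted(recent, key=lambda f: f["flagged_at"]):
        lines.append(f"- {f['app_name']}: {f['rationale']} ({f['review_url']})")
    return lines


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {
            "flagged_at": now - timedelta(hours=5),
            "app_name": "Essay Coach",            # hypothetical app name
            "rationale": "Example rationale",     # placeholder text
            "review_url": "https://example.com/review/1",  # placeholder link
        }
    ]
    print("\n".join(build_digest(sample, now)))
```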

Safety Notification Designation

Organizations can now designate admins to receive safety notifications at the org level. Safety notifications make admins aware of moderated and flagged content across the organization, giving your team a clear point of responsibility for reviewing safety-related activity. At least one admin at your organization must be designated to receive safety notifications at all times. These notifications are designed specifically for the people in your organization responsible for student safety oversight.

Flag Visibility and Acknowledgement

The safety team is building toward a system where flagged content is not only visible but actionable. Upcoming improvements include:
  • Flag visibility at the workspace and member level so admins can see what needs attention across their organization
  • Acknowledge and mark-as-seen functionality that lets admins indicate they have reviewed a flag while preserving a full audit trail
  • Insights views that surface patterns and trends in flagged content, starting with simple metrics and expanding over time
These features are informed by educator feedback and are being prioritized for the next development cycle as part of the Workspaces V2 work.
Flag acknowledgement and insights views are currently in development and not yet available. This page will be updated as these features ship.

Frequently Asked Questions

Contact your organization’s Playlab admin or reach out to support@playlab.ai to be designated to receive safety notifications for your organization.
Moderation notifications are enabled by default for org admins and workspace owners. If you need to adjust notification settings, contact support@playlab.ai.
Playlab uses a set of moderation categories to classify flagged content. These categories are being refined for clarity and accuracy. If you see a category label that is confusing, please let us know.
Review the flagged conversation in your Playlab activity view. If the content was correctly flagged, consider updating your app’s instructions or guardrails or addressing the incident with the app creator and user(s) involved. If it appears to be incorrect moderation, no action is needed, but reporting it helps improve the system. You can report this in your Playlab activity view.
No. Playlab’s moderation and notification tools are designed to complement your existing safety protocols, not replace them. Playlab focuses on surfacing what needs attention so your team can follow your own established procedures.

We want your feedback! These safety features are actively evolving based on educator input. If you have ideas, questions, or feedback on how moderation and notifications are working for your organization, reach out to us.
Contact us: support@playlab.ai
Last updated: April 2026